Comparison of The Performance of GaussianNB Algorithm, The K Neighbors Classifier Algorithm
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue XII Dec 2022- Available at [Link]
Abstract: Most educational institutions worldwide have been closed since March 2020 in an effort to slow the spread of the
Covid-19 pandemic, affecting more than 90% of students around the world. In this study, we make a
prediction about whether or not the Covid-19 pandemic has benefited student performance.
Our data will be divided into training and testing datasets, with 80% of the data utilised for training and 20% for testing.
To calculate the accuracy of our predictions, we use six different algorithms: the RandomForestClassifier Algorithm,
the GaussianNB Algorithm, the K Neighbors Classifier Algorithm, the Logistic Regression Algorithm, the Linear Discriminant
Analysis Algorithm, and the DecisionTree Classifier Algorithm.
Keywords: predictive analytics, GaussianNB Algorithm, the K Neighbors Classifier Algorithm, the Logistic Regression
Algorithm, the Linear Discriminant Analysis Algorithm, and the DecisionTree Classifier Algorithm
I. INTRODUCTION
A. RandomForestClassifier: Suitable for binary, continuous, and categorical data types.
The Random Forest Algorithm consists of several decision trees built on various subsets of a given dataset. Based on the concept of
ensemble learning, it trains each decision tree on a sample of the data, gets a prediction from each of them, and selects the
final answer by means of majority voting.
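The voting idea above can be sketched with scikit-learn. This is a minimal illustration on synthetic data standing in for the paper's survey dataset, not a reproduction of the authors' model:

```python
# Sketch of random-forest majority voting on a toy dataset
# (synthetic data, not the paper's survey data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 200 synthetic samples with two classes stand in for the real dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# 100 decision trees, each trained on a bootstrap sample of the data;
# predict() returns the majority vote across all trees.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)

print(forest.predict(X[:5]))    # majority-vote predictions for 5 samples
print(len(forest.estimators_))  # 100 individual trees
```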
1) Advantages
Reduces the risk of overfitting and the required training time.
Runs efficiently on large databases while producing highly accurate predictions, and can estimate missing data.
2) Disadvantages
Compared to a decision tree, it is slower.
It requires significant memory for storage due to the need for retaining the information from several hundred individual trees.
B. LogisticRegression
1) Advantages
Input features do not need scaling, and the model does not require tuning.
It is highly interpretable and does not require many computational resources.
It is easy to implement and train a model using Logistic Regression.
2) Disadvantages
Constructs linear decision boundaries only.
It is inefficient when the number of observations is smaller than the number of features, which can lead to overfitting.
C. LinearDiscriminantAnalysis
2) Disadvantages
Requires a normal distribution assumption on the features/predictors.
It is sometimes not good for variables with few categories.
D. KNeighborsClassifier
2) Disadvantages
The value of k must always be determined, which can be complex.
Calculating the distance between a data point and all training samples results in a high computation cost.
The size of the model grows as new data is incorporated.
It is a distance-based approach, so the model can be badly affected by outliers, making it prone to overfitting.
E. DecisionTree
A decision tree is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
They can be used in both regression and classification tasks. A decision tree comprises two types of node: decision nodes and leaf nodes.
Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions.
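The decision-node/leaf-node structure can be made concrete with a small scikit-learn example. This is an illustrative sketch on the iris dataset, not the paper's tree:

```python
# Sketch: fit a small decision tree and print its decision and leaf nodes.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# In the printout, lines with a threshold test (e.g. "petal width <= 0.8")
# are decision nodes with branches; "class:" lines are leaf nodes (outputs).
print(export_text(tree, feature_names=list(data.feature_names)))
```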
1) Advantages
Decision trees mimic the human thinking process when making a decision, which makes them easy to understand and interpret.
They require less data cleaning compared to other algorithms.
2) Disadvantages
A decision tree can contain many layers, which makes it complex.
It may have overfitting issues.
F. GaussianNB
1) Advantages
A fast and flexible model that works well with large data.
Little time needs to be spent on training.
2) Disadvantages
Large data records are required to achieve good results.
It shows lower performance than the other classifiers on some types of problem.
A. Data Collection
We gathered the information by sending out questionnaires via Google Forms to our Goa-based students, friends, family, and other
well-wishers.
B. Data Representation
The dataset has 36 columns in total, including the default timestamp: email address, name, educational level, name of the
institution, age, gender, taluka, and a few other columns with sub-questions.
D. Data Analysis
After cleansing and preprocessing the data, feature selection was done. To achieve the best accuracy, we kept 22 of the 35
columns. Then two columns, Sum and Final Result, were added: the Sum column contains the total count
calculated for each row, and the Final Result column contains the average of all responses for each individual. All of this was done simply in
Excel, where 0 means performance has not increased and 1 means it has.
E. Data Analytics
To achieve the best accuracy, we used feature selection on our dataset and removed 22 of the dataset's 35 columns. The data was
then divided into training and testing sets. The Random Forest Classifier technique was used to create a baseline model, and five
other algorithms were employed to compare accuracy.
Fig. 1 Adaptability to online class
Fig. 4 Quality of online Teaching-Learning mechanism
After cleansing and preprocessing the data, feature selection was done, keeping 22 of the 35 columns, and the Sum and Final
Result columns were added. To determine whether performance improved (1) or not (0), we computed the total count for each row
and saved it in the Sum column. Performance was evaluated against the 26–50 range, which we arrived at by summing the
allowable minimum and maximum values for each column. Using the constraint that the sum must be larger than 26 and less than
50, we determined whether the performance had improved (1) or not (0).
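The Excel labelling step described above can be sketched in pandas. The column names and the toy score range here are illustrative, not the survey's actual columns (the paper's range was 26–50):

```python
# Sketch of the Sum / Final_Result labelling step in pandas; the
# question columns are hypothetical stand-ins for the survey questions.
import pandas as pd

# Toy responses: each column is one survey question scored numerically.
df = pd.DataFrame({"q1": [5, 1, 4], "q2": [5, 1, 5], "q3": [4, 0, 5]})

# Sum each row, then label 1 (improved) when the sum falls strictly
# inside the allowed range, else 0 (not improved). A smaller range
# than the paper's 26-50 is used to match the toy data.
LOW, HIGH = 8, 15
df["Sum"] = df.sum(axis=1)
df["Final_Result"] = ((df["Sum"] > LOW) & (df["Sum"] < HIGH)).astype(int)
print(df)
```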
Fig. 6
On the x-axis of the preceding graph we can see the Final Result feature, and on the y-axis the count. Here, 0 indicates
that student performance has not improved, while 1 indicates that it has. We can also see that the count for 0 (performance not
improved) is lower, i.e. 59, while the count for 1 (performance improved) is above 350.
B. Data Analytics
Data analytics makes use of advanced statistical techniques and artificial intelligence to tap the potential of the analysed data
and create predictions about the future. We will therefore make predictions using the provided dataset as a basis.
C. Feature Selection
To achieve the best accuracy, we used feature selection on our dataset, removing 22 of the dataset's 35 columns.
We must first divide the data before making a prediction.
Fig. 7
In this code, y is the output value that we need to predict, so we drop the Final_Result column from the input features.
After this, we split the data into training and testing sets with test_size 0.2, which means 80% training and 20% testing.
After splitting, we test the accuracy of our model using six different algorithms.
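The split described above can be sketched with scikit-learn. The stand-in DataFrame and its feature names are illustrative; only the Final_Result target column comes from the paper:

```python
# Sketch of the 80/20 split; feat_a and feat_b are hypothetical features.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"feat_a": range(10), "feat_b": range(10, 20),
                   "Final_Result": [0, 1] * 5})

X = df.drop("Final_Result", axis=1)   # all columns except the target
y = df["Final_Result"]                # the value we want to predict

# test_size=0.2 -> 80% training, 20% testing.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(len(x_train), len(x_test))      # 8 and 2
```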
E. Building Models
1) First, we look at the RandomForestClassifier() algorithm.
Fig. 8
We utilise the RandomForestClassifier() technique in this code. We store RandomForestClassifier() in the model object, then
call .fit(x_train, y_train) to fit the training data to the model. Then we forecast the test set (x_test) into model_predictions,
print the model's accuracy score against the true y values (y_test), and finally print the classification report with both y_test
and the model predictions.
We can see from the results that our model is 100% accurate.
Here, "support" is the number of actual occurrences of each class in the test set: our predictions were checked against 0 (Not
Improved) 12 times and against 1 (Improved) 73 times.
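The fit/predict/score steps above can be sketched as follows. This runs on synthetic data standing in for the survey dataset, so the printed accuracy will not match the paper's 100%:

```python
# Sketch of fit -> predict -> accuracy -> classification report
# (toy data, not the paper's survey data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=8, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

model = RandomForestClassifier(random_state=1)
model.fit(x_train, y_train)                 # fit the training data
model_predictions = model.predict(x_test)   # forecast the test set

# The report's "support" column counts true samples of each class.
print(accuracy_score(y_test, model_predictions))
print(classification_report(y_test, model_predictions))
```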
2) Next we are going to predict and see the accuracy of LogisticRegression() algorithm.
Fig. 9
We can infer from the LogisticRegression() algorithm's output that the accuracy score remained constant (100% accuracy).
3) Next we are going to predict and see the accuracy of LinearDiscriminantAnalysis() algorithm.
Fig. 10
By comparing the accuracy score of the LinearDiscriminantAnalysis() algorithm (accuracy 94.11%) to those of the
RandomForestClassifier() algorithm (accuracy 100%) and the LogisticRegression() algorithm (accuracy 100%), we can conclude
that the accuracy score has fallen.
4) Next we are going to predict and see the accuracy of KNeighborsClassifier() algorithm.
Fig. 11
By examining the KNeighborsClassifier() algorithm's output, we may conclude that the accuracy score remained the same (100%
accuracy).
5) Next we are going to predict and see the accuracy of DecisionTreeClassifier() algorithm.
Fig. 12
By examining the DecisionTreeClassifier() algorithm's output, we may conclude that the accuracy score remained the same
(100% accuracy).
6) Next we are going to predict and see the accuracy of GaussianNB() algorithm.
Fig. 13
The accuracy of the GaussianNB() algorithm (95.29%) is higher than that of the LinearDiscriminantAnalysis() algorithm, but
lower than those of the RandomForestClassifier(), LogisticRegression(), KNeighborsClassifier(), and DecisionTreeClassifier()
algorithms, all of which have accuracy scores of 100%.
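The six-way comparison described in this section can be sketched as a single loop over the classifiers. This runs on synthetic stand-in data, so the accuracies will differ from the paper's results:

```python
# Sketch of the six-model accuracy comparison on a toy dataset.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=7)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7)

models = [RandomForestClassifier(random_state=7),
          LogisticRegression(max_iter=1000),
          LinearDiscriminantAnalysis(),
          KNeighborsClassifier(),
          DecisionTreeClassifier(random_state=7),
          GaussianNB()]

scores = {}
for m in models:
    m.fit(x_train, y_train)
    scores[type(m).__name__] = m.score(x_test, y_test)  # test accuracy

for name, acc in scores.items():
    print(f"{name}: {acc:.2%}")
```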
IV. CONCLUSIONS
TABLE I
ALGORITHM ACCURACY
Sr. No. Algorithm Accuracy
1 RandomForestClassifier() Algorithm 100%
2 LogisticRegression() Algorithm 100%
3 LinearDiscriminantAnalysis() Algorithm 94.11%
4 KNeighborsClassifier() Algorithm 100%
5 DecisionTreeClassifier() Algorithm 100%
6 GaussianNB() Algorithm 95.29%
According to the preceding table, four of the six techniques, all except LinearDiscriminantAnalysis() and GaussianNB(),
provide 100% accuracy. 354 students' grades have increased, whereas 69 students' grades have not.