Campus Placement Analyzer: Using Supervised Machine Learning Algorithms
Abstract -- The main aim of every academic aspirant is placement in a reputed MNC, and even an institute's reputation and yearly admissions depend upon the placements it provides to its students. So, any system that predicts the placements of students will have a positive impact on an institute, increasing its intake and reducing some of the workload of the institute's training and placement office (TPO). With the help of machine learning techniques, knowledge can be extracted from previously placed students, and the placement of upcoming students can be predicted. The data used for training is taken from the same institute for which the placement prediction is done. Suitable data pre-processing methods are applied along with feature selection. Some domain expertise is used for pre-processing as well as for handling outliers that crept into the dataset. We have used various machine learning algorithms like Logistic Regression, SVM, KNN, Decision Tree and Random Forest, and advanced techniques like Bagging, Boosting and Voting Classifiers, achieving 78% accuracy with XGBoost and 78% with the AdaBoost classifier.
Keywords: Pre-processing, Feature Selection, Domain Expertise, Outliers, Bagging, Boosting, SVM, KNN, Logistic Regression
1. INTRODUCTION
Nowadays, placement plays an important role in a world full of unemployment. Even the ranking and rating of institutes depend upon the average package and the number of placements they provide.

The main objective of this model is to predict whether a student might get placed or not. Different kinds of classifiers were applied, i.e. Logistic Regression, SVM, Decision Tree, Random Forest, KNN, AdaBoost, Gradient Boosting and XGBoost. For this, the overall academic record of the students is taken into consideration. As placement activity takes place in the last year of academics, the last-year semesters are not taken into consideration.

2. RELATED WORK

An accuracy of 71.66% with tested real-life data indicates that the system is reliable for carrying out its major objective, which is to help teachers and the placement cell [2].

Ajay Kumar Pal and Saurabh Pal (2013) predicted the placement of students after completing an MCA using three classification algorithms selected in Weka. The best algorithm on the placement data is Naïve Bayes classification, with an accuracy of 86.15% and a total model build time of 0 seconds. The Naïve Bayes classifier also has the lowest average error, at 0.28, compared to the others [3].
www.ijcat.com 358
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 358-362, 2019, ISSN:-2319–8656
different statuses, i.e. Dream Company, Core Company, Mass Recruiters, Not Eligible and Not Interested [5].

3. DATASET DESCRIPTION AND SYSTEM FLOW

The approach followed is shown in Figure 3.

Figure 3. System flow: Data Gathering → Pre-processing → Feature Selection → Model Training → Model Selection → Prediction

3.1 Pre-processing

- We merged the 12th-standard and diploma marks into a single column for both.
- Some of the tuples were from an M.Tech background, so we dropped them; in the “current_aggregate” column we also dropped the NA values, because those whole rows were NA.
- We replaced all NA values in the columns “Current_Back_Papers” and “Current_Pending_Back_Papers”, and in the semester-wise “Sem_Back_Papers” and “Sem_Pending_Back_Papers” columns, with 0, because these were null only if the student had no backlogs.
- Using LabelEncoder from the preprocessing API in sklearn, we encoded the labels of the columns “Degree_Specializations”, “Campus”, “Gender”, “year_down” and “educational_gap”.
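A minimal sketch of these pre-processing steps, assuming pandas and scikit-learn (the DataFrame contents here are hypothetical samples; only the column names come from the dataset):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows; the real data comes from the institute's records.
df = pd.DataFrame({
    "Current_Back_Papers": [1.0, None, 2.0],   # null means no backlogs
    "Campus": ["A", "B", "A"],
    "Gender": ["M", "F", "M"],
})

# Backlog columns are null only when a student has no backlogs, so fill with 0.
df["Current_Back_Papers"] = df["Current_Back_Papers"].fillna(0)

# Encode categorical columns as integer labels.
for col in ["Campus", "Gender"]:
    df[col] = LabelEncoder().fit_transform(df[col])

print(df)
```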
3.2 Feature Selection
Using machine-learning feature-selection techniques such as “Ridge”, “Lasso”, “RFE”, “plot importance”, “F1 score” and “feature importance”, we obtained the following outputs.
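One of these rankings, tree-based feature importance, can be sketched as follows (the data here is synthetic, not the placement dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 3))            # stand-in for marks/backlog features
y = (X[:, 0] > 0.5).astype(int)     # label driven entirely by feature 0

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
# feature_importances_ ranks features by their contribution to the splits;
# here feature 0 should dominate.
print(tree.feature_importances_)
```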
Figure: Feature selection using “feature importance” with a Decision Tree
Figure 3.2.3 Feature Selection Using “Ridge”

Feature scores using “F1 score”:

Sem4_Aggregate_Marks        312.063809
Current_Aggregate_Marks     286.086537
Sem2_Aggregate_Marks        164.183078
12th_/_Diploma_Aggre_marks  142.208129
Sem1_Aggregate_Marks        139.183936
Sem6_Aggregate_Marks        136.333959
Sem5_Aggregate_Marks        131.988165
10th_Aggregate_Marks        128.526784
Sem6_Back_Papers            128.526784
live_atkt                    47.908927
Sem5_Back_Papers             45.382049
Sem4_Back_Papers             43.547352
Figure: Feature Selection Using “Lasso”
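Coefficient-based selection with Ridge and Lasso, as in the figures above, can be sketched like this (synthetic data; scikit-learn assumed):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.random((100, 4))                        # hypothetical feature matrix
y = 3.0 * X[:, 1] + rng.normal(0, 0.1, 100)     # target driven by feature 1

# Larger absolute coefficients mark more influential features;
# Lasso additionally drives weak coefficients to exactly zero.
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print(np.abs(ridge.coef_).argmax(), np.abs(lasso.coef_).argmax())
```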
[2] Senthil Kumar Thangavel, Divya Bharathi P, Abijith Sankar, “Student Placement Analyzer: A Recommendation System Using Machine Learning”, 2017 International Conference on Advanced Computing and Communication Systems (ICACCS-2017), Coimbatore, India, Jan. 6-7, 2017.

Figure 5.1 Layering of Classifiers

We used a Decision Tree as the base classifier, over that an AdaBoost classifier, and over that a Bagging classifier, because we want to tune the accuracy of the model.
[3] Ajay Kumar Pal, Saurabh Pal, “Classification Model of Prediction for Placement of Students”, I.J. Modern Education and Computer Science, 2013, 11, 49-56, published online 11 November 2013.
6. RESULT AND CONCLUSION

AdaBoost (DT): 77%
XGBoost: 78%