Fake Job Detection Using Machine Learning
https://doi.org/10.22214/ijraset.2022.41641
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue IV Apr 2022- Available at www.ijraset.com
Abstract: This research proposes an automated solution, based on machine learning classification approaches, to prevent
fraudulent job postings on the internet. Many organizations now list their job openings online so that job seekers can access
them quickly and easily. However, such a listing can also be a scam perpetrated by con artists who offer job seekers work in
exchange for money. Many people are duped by this fraud and lose a lot of money as a result. By conducting an exploratory data
analysis on the data and using the insights gained, we can determine which job postings are fraudulent and which are not. To
detect bogus posts, a machine learning approach employing several classification algorithms is used. The system trains a model
to classify jobs as authentic or fake based on historical data of bogus and legitimate job postings. As a starting point,
supervised learning algorithms are considered as classification techniques for recognizing scams among job postings. The system
employs two or more machine learning algorithms and selects the one that yields the highest accuracy in predicting whether a
job advertisement is genuine.
Keywords: Fake Job, Online Recruitment, Machine Learning, Ensemble Approach.
I. INTRODUCTION
For many people, economic hardship and the impact of the coronavirus have drastically reduced the availability of work and resulted
in job loss. Scammers readily take advantage of a situation like this. Many individuals are falling prey to con artists who exploit
the desperation caused by this extraordinary event. Most fraudsters do this to obtain personal information, such as addresses, bank
account numbers, and social security numbers, from the person they are attempting to defraud. Scammers present job seekers with an
attractive job offer and then demand money in exchange. Alternatively, they may require a financial investment from the job seeker
in exchange for the promise of a job. Because of unemployment, job scams are now widespread.
A recruiter can find a qualified candidate through a variety of websites. Fake recruiters sometimes post a job on a job platform
for the sole purpose of making money. Many job boards suffer from this issue. People then move to a new job portal in search of
legitimate employment, but fake recruiters migrate to that portal as well. As a result, it is critical to distinguish between
legitimate and fictitious employment opportunities. Employment fraud is one of the most severe concerns addressed in the area of
Online Recruitment Fraud (ORF) in recent years. Many organizations now list their job openings online so that job seekers can find
them quickly and easily. However, such a listing can also be a scam perpetrated by con artists who offer job seekers work in
exchange for money.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 1822
This is a serious problem that can be addressed using machine learning and natural language processing (NLP) approaches. To detect
bogus posts, a machine learning approach employing several classification algorithms is used. In this scenario, a classification
technique distinguishes bogus job postings from a wider pool of job postings and notifies the user. As a starting point, supervised
learning algorithms are studied as classification techniques for recognizing scams among job postings. A classifier uses training
data to map input variables to target classes. The classifiers used in this paper to distinguish fake job postings from the others
are briefly presented below. These classifier-based predictions fall into two categories: single-classifier predictions and
ensemble-classifier predictions.
A. Naive Bayes
The number of parameters required by a Naive Bayes classifier is linear in the number of variables in a learning problem, which
makes it extremely scalable. Instead of the expensive iterative approximation used for many other types of classifiers,
maximum-likelihood training can be done by simply evaluating a closed-form expression in linear time. Naive Bayes is a
straightforward method for building classifiers, that is, models that assign class labels to problem instances represented as
vectors of feature values, with the class labels drawn from a finite set. The accuracy of this classifier is determined by the
amount of class information lost through the independence assumption, not by the feature dependencies themselves.
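To illustrate the closed-form training described above, the following is a minimal sketch of a multinomial Naive Bayes text classifier written with the Python standard library only. The function names, the Laplace smoothing parameter `alpha`, and the toy postings are assumptions made for illustration; they are not taken from the system described in this paper.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels, alpha=1.0):
    """Estimate multinomial Naive Bayes parameters by counting (a single pass)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class label -> word -> count
    vocabulary = set()
    for tokens, label in zip(documents, labels):
        word_counts[label].update(tokens)
        vocabulary.update(tokens)

    model = {"log_prior": {}, "log_likelihood": {}}
    for label, n_docs in class_counts.items():
        model["log_prior"][label] = math.log(n_docs / len(labels))
        # Laplace smoothing (alpha) avoids zero probabilities for unseen words.
        denom = sum(word_counts[label].values()) + alpha * len(vocabulary)
        model["log_likelihood"][label] = {
            word: math.log((word_counts[label][word] + alpha) / denom)
            for word in vocabulary
        }
    return model


def predict(model, tokens):
    """Return the class label with the highest posterior log-probability."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][label]
        # Words outside the training vocabulary are ignored.
        scores[label] = log_prior + sum(
            likelihood[w] for w in tokens if w in likelihood
        )
    return max(scores, key=scores.get)


# Toy, hand-made postings used only to exercise the code.
toy_docs = [
    ["earn", "money", "fast", "no", "experience"],
    ["send", "fee", "to", "start", "earning", "money"],
    ["software", "engineer", "python", "experience", "required"],
    ["accountant", "position", "degree", "required"],
]
toy_labels = ["fake", "fake", "real", "real"]
toy_model = train_naive_bayes(toy_docs, toy_labels)
```

Training only counts words per class, which is why its cost grows linearly with the size of the data. Prediction sums log-probabilities instead of multiplying probabilities to avoid numerical underflow on long postings.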
C. Logistic Regression
Logistic regression is employed in classification when the response variable is categorical. For example, when the number of hours
spent studying is supplied as a feature to predict whether a student passes or fails an exam, the response variable has two values:
pass and fail. Binomial Logistic Regression addresses problems in which the response variable has two values: 0 and 1, pass and
fail, or true and false. When the response variable can have three or more possible values, Multinomial Logistic Regression is used.
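The pass/fail example can be sketched as a binomial logistic regression with a single feature. The code below is a minimal illustration that fits the model by batch gradient descent on the log-loss; the learning rate, the number of epochs, and the hours/outcome values are assumptions chosen for the example and are not data from this study.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(hours, passed, learning_rate=0.1, epochs=5000):
    """Fit a weight and a bias by batch gradient descent on the log-loss."""
    weight, bias = 0.0, 0.0
    n = len(hours)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(hours, passed):
            error = sigmoid(weight * x + bias) - y  # predicted minus actual
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def predict_pass_probability(weight, bias, hours_studied):
    """Estimated probability that a student passes (class 1)."""
    return sigmoid(weight * hours_studied + bias)


# Hypothetical study hours and outcomes (1 = pass, 0 = fail).
study_hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
outcomes = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
w, b = fit_logistic_regression(study_hours, outcomes)
```

A student is then classified as "pass" when `predict_pass_probability(w, b, hours)` exceeds a threshold, commonly 0.5.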
V. PROPOSED METHODOLOGY
The main purpose of this system is to identify whether a job posting is genuine. Job seekers will be able to focus entirely on
legitimate job openings if fake job postings are identified and removed. In this system, we plan to use a Kaggle dataset that
contains information on each job, including attributes such as job id, title, location, and department. The data is then
preprocessed, which involves removing extraneous spaces, null entries, stopwords, and so on. Once the data has been preprocessed
and cleaned, it is passed to the classifier for prediction.
A. Dataset Details
This Kaggle dataset contains 17,880 job postings. Before the data is fitted to any machine learning model or classifier, it must
be preprocessed to prepare it for prediction. The pre-processing techniques applied include missing-value removal, stop-word
removal, irrelevant-attribute removal, and removal of unnecessary spaces. This prepares the dataset for categorical encoding,
which is used to generate a feature vector.
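The pre-processing steps listed above can be sketched as follows. This is a minimal illustration using only the Python standard library; the stop-word list, the choice of relevant columns, and the `description` and `fraudulent` field names are assumptions made for the example rather than details confirmed by this paper.

```python
import re

# A tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "in", "of", "to", "for", "we"}

# Hypothetical subset of attributes treated as relevant in this sketch.
RELEVANT_COLUMNS = ("title", "location", "description")


def clean_text(text):
    """Lower-case, strip punctuation, collapse spaces, and drop stop words."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = text.split()  # split() also removes unnecessary whitespace
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def preprocess(records):
    """Drop records with missing values and keep only cleaned relevant columns."""
    cleaned = []
    for record in records:
        # Missing-value removal: skip rows with a null or blank relevant field.
        if any(not (record.get(col) or "").strip() for col in RELEVANT_COLUMNS):
            continue
        # Irrelevant-attribute removal: only the relevant columns are copied.
        row = {col: clean_text(record[col]) for col in RELEVANT_COLUMNS}
        row["fraudulent"] = record["fraudulent"]  # assumed name of the label
        cleaned.append(row)
    return cleaned


# Made-up records used only to exercise the code.
sample_records = [
    {"job_id": 1, "title": "  Data   Entry Clerk ", "location": "US, NY",
     "description": "Earn money in the comfort of your home!", "fraudulent": 1},
    {"job_id": 2, "title": "Engineer", "location": None,
     "description": "Build services", "fraudulent": 0},
]
prepared = preprocess(sample_records)
```

The second sample record is dropped because its location is missing, and the `job_id` attribute is discarded as irrelevant to the text features.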
C. Feature Extraction
Feature extraction is a step in the dimensionality reduction process, in which a large set of raw data is divided and reduced into
smaller, more manageable groups. The defining characteristic of these large data sets is their large number of variables, which
require substantial computational power to process. By selecting and combining variables into features, feature extraction helps
obtain the most informative features from these data sets and effectively reduces the amount of data. These features are simple to
use while still describing the underlying data set precisely and uniquely. TF-IDF is used to extract features in this study. The
TF-IDF (Term Frequency-Inverse Document Frequency) statistic is a numerical measure of how important a term is to a document in a
corpus or collection.
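The term frequency (TF) of a term $t$ in a document $d$ is
$$\mathrm{TF}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}},$$
where $f_{t,d}$ is the number of times $t$ occurs in $d$.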
The IDF (Inverse Document Frequency) is a measure of how important a term is across the corpus. We need the IDF value because
computing the TF alone is not enough to capture the significance of words:
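$$\mathrm{IDF}(t) = \log\left(\frac{N}{\text{number of documents containing the term } t}\right),$$
where $N$ is the total number of documents in the corpus.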
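As an illustration, TF-IDF feature vectors can be computed as in the sketch below. It assumes that TF is the relative frequency of a term within a document and that IDF is the natural logarithm of the number of documents divided by the number of documents containing the term; libraries often use smoothed variants, so the exact values may differ from those produced by the tools used in this study. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def compute_tf(tokens):
    """Relative frequency of each term within a single document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def compute_idf(corpus):
    """log(N / df) for every term, where df counts documents containing it."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vectors(corpus):
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    idf = compute_idf(corpus)
    vocabulary = sorted(idf)
    vectors = []
    for tokens in corpus:
        tf = compute_tf(tokens)
        vectors.append([tf.get(term, 0.0) * idf[term] for term in vocabulary])
    return vocabulary, vectors


# Toy tokenized postings used only to exercise the code.
toy_corpus = [
    ["earn", "money", "fast"],
    ["software", "engineer", "money"],
]
vocab, vectors = tfidf_vectors(toy_corpus)
```

In this toy corpus the term "money" appears in every document, so its IDF, and therefore its TF-IDF weight, is zero, while terms unique to one posting receive a positive weight.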
D. Implementation of Classifiers
In this section, the classifiers are trained with appropriate parameters. For prediction, this framework used Logistic Regression,
SVM, and Naive Bayes models. SVM has a number of distinguishing characteristics that have made it popular and have produced
promising experimental results. To separate the data points, SVM constructs a hyperplane in the original input space.
Naive Bayes predicts the probability of each class from the data, whereas logistic regression estimates probabilities using a
logistic function to model the relationship between the dependent variable and one or more independent variables.
The Random Forest ensemble classifier, built from a collection of tree-structured classifiers, was used as the ensemble
classification algorithm.
This random forest model was built with 100 estimators. Once these classification models have been constructed, the training
dataset is used to make predictions, and their performance is then evaluated.
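The evaluation step, in which several classifiers are compared and the one with the highest accuracy is selected, can be sketched as follows. The helper names, the model labels, and the prediction lists are placeholders invented for this example; they are not results from the experiments reported here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def select_best_model(y_true, predictions_by_model):
    """Score every model and return the best model name with all scores."""
    scores = {
        name: accuracy(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder labels (1 = fake, 0 = genuine) and placeholder predictions.
true_labels = [1, 0, 0, 1, 0, 0]
placeholder_predictions = {
    "model_a": [1, 0, 0, 0, 0, 0],
    "model_b": [1, 1, 0, 1, 0, 1],
}
best_name, all_scores = select_best_model(true_labels, placeholder_predictions)
```

Because fake postings are usually a small minority of all postings, accuracy alone can look high even when few scams are caught, so precision and recall on the fake class are worth reporting alongside it.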
A. Flask
Flask is a Python-based web application framework. It provides a number of modules that make it easy for a web developer to
build applications without having to worry about protocol management, thread management, and similar concerns.
B. Heroku
Heroku is a cloud platform that supports several programming languages and on which we can deploy our applications.
VIII. CONCLUSIONS
Detecting employment scams helps ensure that job seekers receive only legitimate offers from companies. In this work, several
machine learning methods are proposed as countermeasures for detecting employment scams. A supervised approach is used to
demonstrate the use of several classifiers for detecting job scams. The experimental results show that the Random Forest
classifier outperforms its peers in classification. The proposed method achieved 97 percent accuracy, which is significantly
higher than that of current approaches.