Identifying Real and Fake Job Posting using Machine Learning

Sherina Sara Jaison
Mallikarjuna Kodabagi

Abstract

Research estimates that around 188 million people are unemployed worldwide. Job portals and other websites list many vacancies to help job seekers, and India alone has more than a hundred job portals. A major problem is that job seekers cannot be sure whether an employer is genuine. Most portals have no system for checking whether the employer posting a job is real or fake. Scammers exploit this gap by posting fake job offers that look genuine to applicants, who may then lose a large amount of money and time. The best solution would be for the job portal itself to identify whether a posted job is real or fake. This paper proposes a machine learning model to achieve this goal. The idea is to use natural language processing to analyze the job posting and then use a machine learning model to predict whether the posting is real or fake. The first step is to import a dataset of real-life job postings, both real and fake. This project uses the Employment Scam Aegean Dataset (EMSCAD), provided by the Laboratory of Information and Communication Systems Security at the University of the Aegean. The dataset contains about 18,000 real-life job postings. Text cleaning techniques such as lemmatization, stop-word removal, and removal of special characters and punctuation are applied to the data. Once the text is processed, several algorithms (Random Forest, Linear SVC, Gradient Boosting Classifier, Gaussian naïve Bayes classifier, and XGB classifier) are used to test the performance of the model. The two algorithms that classified real and fake job postings with the highest accuracy were taken forward: Random Forest and Linear SVC both achieved accuracy close to 98%.
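The text cleaning step described above can be illustrated with a minimal sketch. It uses only the Python standard library and covers special-character, punctuation, and stop-word removal. The function name `clean_text` and the short `STOP_WORDS` set are assumptions made for illustration, not the authors' code. Lemmatization is left out because it needs an external library (for example NLTK's `WordNetLemmatizer`).

```python
import re

# Tiny illustrative stop-word list (an assumption); a real pipeline
# would use a fuller list from an NLP library.
STOP_WORDS = {
    "a", "an", "and", "the", "is", "are", "to", "of",
    "for", "in", "on", "with", "we", "you", "our",
}


def clean_text(text: str) -> str:
    """Lowercase, drop special characters and punctuation, remove stop words."""
    text = text.lower()
    # Replace everything except lowercase letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


cleaned = clean_text("Earn $5,000/week!!! No experience needed, apply to the link.")
print(cleaned)  # earn week no experience needed apply link
```

Note that this sketch also discards digits, which may or may not be desirable for scam detection, since salary figures can be informative.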
Random Forest and Linear SVC were then tuned using GridSearchCV, a class in scikit-learn's model_selection module. After tuning, the performance of both algorithms improved, and Linear SVC achieved the better accuracy score of 99%. Linear SVC is therefore used in this project to predict whether a job posting on a job portal is real or fake.
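As a rough illustration of the tuning step, the sketch below runs `GridSearchCV` over a `LinearSVC` classifier with scikit-learn. Several details are assumptions, since the abstract does not specify them: the TF-IDF features, the choice of `C` as the tuned hyperparameter, the grid values, and the toy postings, which are invented here and are not drawn from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Invented toy postings for illustration only: 1 = fake, 0 = real.
texts = [
    "earn money fast from home no experience pay a small fee",
    "wire a registration fee to receive your guaranteed job offer",
    "work from home and earn thousands weekly send bank details",
    "urgent hiring no interview pay a processing fee today",
    "software engineer needed with python experience and a degree",
    "accountant position at a regional firm with full benefits",
    "registered nurse for city hospital night shift license required",
    "marketing manager with five years of experience in retail",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Feature extraction and classifier are tuned together as one estimator.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),           # assumed feature representation
    ("clf", LinearSVC(random_state=0)),
])

# Assumed search space: only the regularization strength C is varied.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print("Best parameters:", search.best_params_)
predictions = search.predict(["pay a fee to start working from home today"])
print("Predicted label:", predictions[0])
```

On the full dataset, a larger grid and more cross-validation folds would be more appropriate than the `cv=2` used for this toy example.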

Article Details

How to Cite
Sherina Sara Jaison, & Mallikarjuna Kodabagi. (2023). Identifying Real and Fake Job Posting using Machine Learning. Journal of Advanced Zoology, 44(S6), 622–627. https://doi.org/10.17762/jaz.v44iS6.2266