Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions

In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting. Howev...

Bibliographic details
Published in: Computers, materials & continua, 2021-01, Vol.68 (1), p.521-535
Main authors: Thanh Vo, Minh; H. Vo, Anh; Nguyen, Trang; Sharma, Rohit; Le, Tuong
Format: Article
Language: eng
Online access: Full text
container_end_page 535
container_issue 1
container_start_page 521
container_title Computers, materials & continua
container_volume 68
creator Thanh Vo, Minh
H. Vo, Anh
Nguyen, Trang
Sharma, Rohit
Le, Tuong
description In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting. However, the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs. This causes a reduction in the predictability and performance of traditional machine learning models. We therefore present an efficient framework that uses an oversampling technique called FJD-OT (Fake Job Description Detection Using Oversampling Techniques) to improve the predictability of detecting fake job descriptions. In the proposed framework, we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module. We then use a bag of words in combination with the term frequency-inverse document frequency (TF-IDF) approach to extract the features from the text data to create the feature dataset in the second module. Next, our framework applies k-fold cross-validation, a commonly used technique to test the effectiveness of machine learning models, that splits the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets for evaluation. The training set is passed through the third module, an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.
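The ordering of steps in the description above (preprocess, extract TF-IDF features, split into folds, then oversample only the training portion) can be illustrated with a minimal, standard-library sketch. This is not the authors' FJD-OT implementation: the stop-word list, the smoothed IDF formula, the toy documents, and all function names are hypothetical, and plain random duplication of minority samples is swapped in for SVMSMOTE, which generates synthetic samples instead.

```python
import math
import random
from collections import Counter

# Hypothetical toy stop-word list; the description does not say which list is used.
STOP_WORDS = {"a", "an", "the", "and", "to", "for", "of", "in", "is", "we"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (w.strip(".,!?;:()").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return the vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumption; the description gives no formula).
    idf = {t: math.log((1 + len(docs)) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k shuffled folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, test

def oversample_minority(indices, labels, seed=0):
    """Balance classes by duplicating minority indices (stand-in for SVMSMOTE)."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced

# Hypothetical toy data: label 1 = fake posting, 0 = genuine posting.
docs = [
    "Software engineer needed for backend team",
    "Earn money fast, no experience needed!",
    "Accountant position in Athens office",
    "Nurse wanted for city hospital",
    "Work from home and get rich quickly",
    "Teacher for primary school",
    "Sales manager for retail chain",
    "Data analyst in finance department",
]
labels = [0, 1, 0, 0, 1, 0, 0, 0]

vocab, features = tfidf_vectors(docs)
for train, test in k_fold_indices(len(docs), k=2):
    # Oversampling touches the training indices only; the test fold stays as-is.
    balanced_train = oversample_minority(train, labels)
    print(Counter(labels[i] for i in balanced_train), "test size:", len(test))
```

Balancing after the split, rather than before it, keeps copies of minority samples out of the test fold, so the evaluation still reflects the original class imbalance.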
doi_str_mv 10.32604/cmc.2021.015645
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2507804829</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2507804829</sourcerecordid><originalsourceid>FETCH-LOGICAL-c313t-12c29124699231a89c70c12e9a1fc78dee503b69a101d63bc8aec8cd3656d02e3</originalsourceid><addsrcrecordid>eNpNkDFPwzAQhS0EEqWwM1piTjnbiRuPqKVQVAmEYLacy4WmJHGxUyH-PWnLwHTv3j29kz7GrgVMlNSQ3mKLEwlSTEBkOs1O2EhkqU6klPr0nz5nFzFuAJRWBkbsdU6uqbsP_l33a96vic8aFyNftoVrXIfEX4IvGmp53R3Oc-oJ-9p33Fd84T6JP_licCOGerv34yU7q1wT6epvjtn74v5t9pisnh-Ws7tVgkqoPhESpREy1cZIJVxucAooJBknKpzmJVEGqtDDCqLUqsDcEeZYKp3pEiSpMbs59m6D_9pR7O3G70I3vLQyg2kOaS7NkIJjCoOPMVBlt6FuXfixAuyBnB3I2T05eySnfgGHCGAQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2507804829</pqid></control><display><type>article</type><title>Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Thanh Vo, Minh ; H. Vo, Anh ; Nguyen, Trang ; Sharma, Rohit ; Le, Tuong</creator><creatorcontrib>Thanh Vo, Minh ; H. Vo, Anh ; Nguyen, Trang ; Sharma, Rohit ; Le, Tuong</creatorcontrib><description>In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting. However, the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs. This causes a reduction in the predictability and performance of traditional machine learning models. 
We therefore present an efficient framework that uses an oversampling technique called FJD-OT (Fake Job Description Detection Using Oversampling Techniques) to improve the predictability of detecting fake job descriptions. In the proposed framework, we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module. We then use a bag of words in combination with the term frequency-inverse document frequency (TF-IDF) approach to extract the features from the text data to create the feature dataset in the second module. Next, our framework applies k-fold cross-validation, a commonly used technique to test the effectiveness of machine learning models, that splits the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets for evaluation. The training set is passed through the third module, an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.</description><identifier>ISSN: 1546-2226</identifier><identifier>ISSN: 1546-2218</identifier><identifier>EISSN: 1546-2226</identifier><identifier>DOI: 10.32604/cmc.2021.015645</identifier><language>eng</language><publisher>Henderson: Tech Science Press</publisher><subject>Datasets ; Descriptions ; Feature extraction ; Fraud ; Internet ; Job descriptions ; Job hunting ; Machine learning ; Modules ; Oversampling ; Performance measurement ; Training</subject><ispartof>Computers, materials &amp; continua, 2021-01, Vol.68 (1), p.521-535</ispartof><rights>2021. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c313t-12c29124699231a89c70c12e9a1fc78dee503b69a101d63bc8aec8cd3656d02e3</citedby><cites>FETCH-LOGICAL-c313t-12c29124699231a89c70c12e9a1fc78dee503b69a101d63bc8aec8cd3656d02e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Thanh Vo, Minh</creatorcontrib><creatorcontrib>H. Vo, Anh</creatorcontrib><creatorcontrib>Nguyen, Trang</creatorcontrib><creatorcontrib>Sharma, Rohit</creatorcontrib><creatorcontrib>Le, Tuong</creatorcontrib><title>Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions</title><title>Computers, materials &amp; continua</title><description>In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting. However, the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs. This causes a reduction in the predictability and performance of traditional machine learning models. We therefore present an efficient framework that uses an oversampling technique called FJD-OT (Fake Job Description Detection Using Oversampling Techniques) to improve the predictability of detecting fake job descriptions. In the proposed framework, we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module. 
We then use a bag of words in combination with the term frequency-inverse document frequency (TF-IDF) approach to extract the features from the text data to create the feature dataset in the second module. Next, our framework applies k-fold cross-validation, a commonly used technique to test the effectiveness of machine learning models, that splits the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets for evaluation. The training set is passed through the third module, an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.</description><subject>Datasets</subject><subject>Descriptions</subject><subject>Feature extraction</subject><subject>Fraud</subject><subject>Internet</subject><subject>Job descriptions</subject><subject>Job hunting</subject><subject>Machine learning</subject><subject>Modules</subject><subject>Oversampling</subject><subject>Performance measurement</subject><subject>Training</subject><issn>1546-2226</issn><issn>1546-2218</issn><issn>1546-2226</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNpNkDFPwzAQhS0EEqWwM1piTjnbiRuPqKVQVAmEYLacy4WmJHGxUyH-PWnLwHTv3j29kz7GrgVMlNSQ3mKLEwlSTEBkOs1O2EhkqU6klPr0nz5nFzFuAJRWBkbsdU6uqbsP_l33a96vic8aFyNftoVrXIfEX4IvGmp53R3Oc-oJ-9p33Fd84T6JP_licCOGerv34yU7q1wT6epvjtn74v5t9pisnh-Ws7tVgkqoPhESpREy1cZIJVxucAooJBknKpzmJVEGqtDDCqLUqsDcEeZYKp3pEiSpMbs59m6D_9pR7O3G70I3vLQyg2kOaS7NkIJjCoOPMVBlt6FuXfixAuyBnB3I2T05eySnfgGHCGAQ</recordid><startdate>20210101</startdate><enddate>20210101</enddate><creator>Thanh Vo, Minh</creator><creator>H. 
Vo, Anh</creator><creator>Nguyen, Trang</creator><creator>Sharma, Rohit</creator><creator>Le, Tuong</creator><general>Tech Science Press</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope></search><sort><creationdate>20210101</creationdate><title>Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions</title><author>Thanh Vo, Minh ; H. Vo, Anh ; Nguyen, Trang ; Sharma, Rohit ; Le, Tuong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c313t-12c29124699231a89c70c12e9a1fc78dee503b69a101d63bc8aec8cd3656d02e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Datasets</topic><topic>Descriptions</topic><topic>Feature extraction</topic><topic>Fraud</topic><topic>Internet</topic><topic>Job descriptions</topic><topic>Job hunting</topic><topic>Machine learning</topic><topic>Modules</topic><topic>Oversampling</topic><topic>Performance measurement</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Thanh Vo, Minh</creatorcontrib><creatorcontrib>H. 
Vo, Anh</creatorcontrib><creatorcontrib>Nguyen, Trang</creatorcontrib><creatorcontrib>Sharma, Rohit</creatorcontrib><creatorcontrib>Le, Tuong</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>Computers, materials &amp; continua</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Thanh Vo, Minh</au><au>H. 
Vo, Anh</au><au>Nguyen, Trang</au><au>Sharma, Rohit</au><au>Le, Tuong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions</atitle><jtitle>Computers, materials &amp; continua</jtitle><date>2021-01-01</date><risdate>2021</risdate><volume>68</volume><issue>1</issue><spage>521</spage><epage>535</epage><pages>521-535</pages><issn>1546-2226</issn><issn>1546-2218</issn><eissn>1546-2226</eissn><abstract>In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting. However, the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs. This causes a reduction in the predictability and performance of traditional machine learning models. We therefore present an efficient framework that uses an oversampling technique called FJD-OT (Fake Job Description Detection Using Oversampling Techniques) to improve the predictability of detecting fake job descriptions. In the proposed framework, we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module. We then use a bag of words in combination with the term frequency-inverse document frequency (TF-IDF) approach to extract the features from the text data to create the feature dataset in the second module. Next, our framework applies k-fold cross-validation, a commonly used technique to test the effectiveness of machine learning models, that splits the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets for evaluation. 
The training set is passed through the third module, an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.</abstract><cop>Henderson</cop><pub>Tech Science Press</pub><doi>10.32604/cmc.2021.015645</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1546-2226
ispartof Computers, materials & continua, 2021-01, Vol.68 (1), p.521-535
issn 1546-2226
1546-2218
1546-2226
language eng
recordid cdi_proquest_journals_2507804829
source Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Datasets
Descriptions
Feature extraction
Fraud
Internet
Job descriptions
Job hunting
Machine learning
Modules
Oversampling
Performance measurement
Training
title Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T10%3A14%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dealing%20with%20the%20Class%20Imbalance%20Problem%20in%20the%20Detection%20of%20Fake%20Job%20Descriptions&rft.jtitle=Computers,%20materials%20&%20continua&rft.au=Thanh%20Vo,%20Minh&rft.date=2021-01-01&rft.volume=68&rft.issue=1&rft.spage=521&rft.epage=535&rft.pages=521-535&rft.issn=1546-2226&rft.eissn=1546-2226&rft_id=info:doi/10.32604/cmc.2021.015645&rft_dat=%3Cproquest_cross%3E2507804829%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2507804829&rft_id=info:pmid/&rfr_iscdi=true